AITopics | theoretical perspective

Collaborating Authors

theoretical perspective

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Is RLHF More Difficult than Standard RL? A Theoretical Perspective

Neural Information Processing SystemsDec-27-2025, 04:14:37 GMT

Reinforcement learning from Human Feedback (RLHF) learns from preference signals, while standard Reinforcement Learning (RL) directly learns from reward signals. Preferences arguably contain less information than rewards, which makes preference-based RL seemingly more difficult. This paper theoretically proves that, for a wide range of preference models, we can solve preference-based RL directly using existing algorithms and techniques for reward-based RL, with small or no extra costs. Specifically, (1) for preferences that are drawn from reward-based probabilistic models, we reduce the problem to robust reward-based RL that can tolerate small errors in rewards; (2) for general arbitrary preferences where the objective is to find the von Neumann winner, we reduce the problem to multiagent reward-based RL which finds Nash equilibria for factored Markov games under a restricted set of policies. The latter case can be further reduce to adversarial MDP when preferences only depend on the final state. We instantiate all reward-based RL subroutines by concrete provable algorithms, and apply our theory to a large class of models including tabular MDPs and MDPs with generic function approximation. We further provide guarantees when K-wise comparisons are available.

name change, standard rl, theoretical perspective, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.63)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)

Add feedback

Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective

Neural Information Processing SystemsDec-26-2025, 23:59:57 GMT

Recent studies have discovered that Chain-of-Thought prompting (CoT) can dramatically improve the performance of Large Language Models (LLMs), particularly when dealing with complex tasks involving mathematics or reasoning. Despite the enormous empirical success, the underlying mechanisms behind CoT and how it unlocks the potential of LLMs remain elusive. In this paper, we take a first step towards theoretically answering these questions. Specifically, we examine the expressivity of LLMs with CoT in solving fundamental mathematical and decision-making problems. By using circuit complexity theory, we first give impossibility results showing that bounded-depth Transformers are unable to directly produce correct answers for basic arithmetic/equation tasks unless the model size grows super-polynomially with respect to the input length. In contrast, we then prove by construction that autoregressive Transformers of constant size suffice to solve both tasks by generating CoT derivations using a commonly used math language format. Moreover, we show LLMs with CoT can handle a general class of decision-making problems known as Dynamic Programming, thus justifying their power in tackling complex real-world tasks. Finally, an extensive set of experiments show that, while Transformers always fail to directly predict the answers, they can consistently learn to generate correct solutions step-by-step given sufficient CoT demonstrations.

name change, revealing, theoretical perspective, (7 more...)

Neural Information Processing Systems

Genre: Research Report (0.59)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Kernel Stein Discrepancy thinning: a theoretical perspective of pathologies and a practical fix with regularization

Neural Information Processing SystemsDec-26-2025, 10:10:18 GMT

Stein thinning is a promising algorithm proposed by (Riabiz et al., 2022) for post-processing outputs of Markov chain Monte Carlo (MCMC). The main principle is to greedily minimize the kernelized Stein discrepancy (KSD), which only requires the gradient of the log-target distribution, and is thus well-suited for Bayesian inference. The main advantages of Stein thinning are the automatic remove of the burn-in period, the correction of the bias introduced by recent MCMC algorithms, and the asymptotic properties of convergence towards the target distribution. Nevertheless, Stein thinning suffers from several empirical pathologies, which may result in poor approximations, as observed in the literature. In this article, we conduct a theoretical analysis of these pathologies, to clearly identify the mechanisms at stake, and suggest improved strategies. Then, we introduce the regularized Stein thinning algorithm to alleviate the identified pathologies. Finally, theoretical guarantees and extensive experiments show the high efficiency of the proposed algorithm. An implementation of regularized Stein thinning as the kernax library in python and JAX is available at https://gitlab.com/drti/kernax.

kernel stein discrepancy, practical fix, theoretical perspective, (6 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Diagnostic Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.83)

Add feedback

Revisiting Scalarization in Multi-Task Learning: A Theoretical Perspective

Neural Information Processing SystemsDec-26-2025, 09:47:55 GMT

Linear scalarization, i.e., combining all loss functions by a weighted sum, has been the default choice in the literature of multi-task learning (MTL) since its inception. In recent years, there is a surge of interest in developing Specialized Multi-Task Optimizers (SMTOs) that treat MTL as a multi-objective optimization problem. However, it remains open whether there is a fundamental advantage of SMTOs over scalarization. In fact, heated debates exist in the community comparing these two types of algorithms, mostly from an empirical perspective. To approach the above question, in this paper, we revisit scalarization from a theoretical perspective.

multi-task learning, revisiting scalarization, scalarization, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.75)
Information Technology > Artificial Intelligence > Machine Learning (0.66)

Add feedback

Dimensionality reduction: theoretical perspective on practical measures

Neural Information Processing SystemsDec-25-2025, 18:05:47 GMT

Dimensionality reduction plays a central role in real-world applications for Machine Learning, among many fields. In particular, metric dimensionality reduction where data from a general metric is mapped into low dimensional space, is often used as a first step before applying machine learning algorithms. In almost all these applications the quality of the embedding is measured by various average case criteria. Metric dimensionality reduction has also been studied in Math and TCS, within the extremely fruitful and influential field of metric embedding. Yet, the vast majority of theoretical research has been devoted to analyzing the worst case behavior of embeddings and therefore has little relevance to practical settings.

dimensionality reduction, reduction, theoretical perspective, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Sparse Neural Networks

Neural Information Processing SystemsDec-23-2025, 19:28:28 GMT

The lottery ticket hypothesis (LTH) states that learning on a properly pruned network (the winning ticket) has improved test accuracy over the original unpruned network. Although LTH has been justified empirically in a broad range of deep neural network (DNN) involved applications like computer vision and natural language processing, the theoretical validation of the improved generalization of a winning ticket remains elusive. To the best of our knowledge, our work, for the first time, characterizes the performance of training a pruned neural network by analyzing the geometric structure of the objective function and the sample complexity to achieve zero generalization error. We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned, indicating the structural importance of a winning ticket. Moreover, as the algorithm for training a pruned neural network is specified as an (accelerated) stochastic gradient descent algorithm, we theoretically show that the number of samples required for achieving zero generalization error is proportional to the number of the non-pruned weights in the hidden layer. With a fixed number of samples, training a pruned neural network enjoys a faster convergence rate to the desired model than training the original unpruned one, providing a formal justification of the improved generalization of the winning ticket. Our theoretical results are acquired from learning a pruned neural network of one hidden layer, while experimental results are further provided to justify the implications in pruning multi-layer neural networks.

lottery ticket win, neural network, theoretical perspective, (8 more...)

Neural Information Processing Systems

Genre: Contests & Prizes (1.00)

Industry: Leisure & Entertainment > Gambling (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.59)

Add feedback

Dimensionality reduction: theoretical perspective on practical measures

Yair Bartal, Nova Fandina, Ofer Neiman

Neural Information Processing SystemsOct-3-2025, 06:12:18 GMT

Dimensionality reduction plays a central role in real world applications for Machine Learning, among many fields.

artificial intelligence, data mining, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Europe (0.67)
Asia > Middle East > Israel (0.28)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.63)

Add feedback

A Theoretical Perspective for Speculative Decoding Algorithm

Neural Information Processing SystemsMay-27-2025, 20:10:45 GMT

Transformer-based autoregressive sampling has been the major bottleneck for slowing down large language model inferences. One effective way to accelerate inference is Speculative Decoding, which employs a small model to sample a sequence of draft tokens and a large model to validate. Given its empirical effectiveness, the theoretical understanding of Speculative Decoding is falling behind. This paper tackles this gap by conceptualizing the decoding problem via markov chain abstraction and studying the key properties, output quality and inference acceleration, from a theoretical perspective. Our analysis covers the theoretical limits of speculative decoding, batch algorithms, and output quality-inference acceleration tradeoffs.

inference, speculative decoding algorithm, theoretical perspective

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)
Information Technology > Artificial Intelligence > Machine Learning (0.87)

Add feedback

Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective

Neural Information Processing SystemsMay-26-2025, 23:13:40 GMT

Pre-trained large language models (LLMs) based on Transformer have demonstrated striking in-context learning (ICL) abilities. With a few demonstration input-label pairs, they can predict the label for an unseen input without any parameter updates. In this paper, we show an exciting phenomenon that SVD-based weight pruning can enhance ICL performance, and more surprising, pruning weights in deep layers often results in more stable performance improvements than in shallow layers. However, the underlying mechanism of those findings still remains an open question. To reveal those findings, we conduct an in-depth theoretical analysis by presenting the implicit gradient descent (GD) trajectories of ICL and giving the mutual information based generalization bounds of ICL via full implicit GD trajectories.

large language model, machine learning, natural language, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

Add feedback

Bridging Contrastive Learning and Domain Adaptation: Theoretical Perspective and Practical Application

Quintana, Gonzalo Iñaki, Vancamberg, Laurence, Jugnon, Vincent, Desolneux, Agnès, Mougeot, Mathilde

arXiv.org Artificial IntelligenceJan-28-2025

This work studies the relationship between Contrastive Learning and Domain Adaptation from a theoretical perspective. The two standard contrastive losses, NT-Xent loss (Self-supervised) and Supervised Contrastive loss, are related to the Class-wise Mean Maximum Discrepancy (CMMD), a dissimilarity measure widely used for Domain Adaptation. Our work shows that minimizing the contrastive losses decreases the CMMD and simultaneously improves class-separability, laying the theoretical groundwork for the use of Contrastive Learning in the context of Domain Adaptation. Due to the relevance of Domain Adaptation in medical imaging, we focused the experiments on mammography images. Extensive experiments on three mammography datasets - synthetic patches, clinical (real) patches, and clinical (real) images - show improved Domain Adaptation, class-separability, and classification performance, when minimizing the Supervised Contrastive loss.

artificial intelligence, domain adaptation, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2502.00052

Country:

Europe > France (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.56)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback